-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT: Derived memh #10332
base: master
Are you sure you want to change the base?
UCT: Derived memh #10332
Conversation
src/uct/ib/mlx5/dv/ib_mlx5dv_md.c
Outdated
base->atomic_dvmr = NULL; | ||
base->atomic_rkey = UCT_IB_INVALID_MKEY; | ||
base->indirect_dvmr = NULL; | ||
base->indirect_rkey = UCT_IB_INVALID_MKEY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to invalidate basic memh keys? I'd expect that this will break (at least performance) of some (at least atomics) operations on the base key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should not hinder performance, because base memh rkeys will be recreated on next mkey_pack, like is done in my test.
But you are right, strictly speaking it's NOT necessary to transfer rkeys ownership to the derived memh. It will work even without this transfer. I will remove this ownership transfer logic
The reason why I added this transfer is the following:
Currently it's UCT layer that decides whether to create rkeys (atomic + indirect), and conditions for them are different:
- atomic rkey is created if user requested it (with
UCT_MD_MEM_ACCESS_REMOTE_ATOMIC
), and hardware supports it. So it might be created no matter if invalidation is needed - indirect rkey is created only when invalidation is needed
Now from UCP layer we assume that we will create a memory window when invalidation if needed from EP config. By this time atomic rkeys on base memh might be already created and I thought it's a good idea to move it to the memory window, to avoid re-creating this key per MW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed ownership transfer for now, let's consider this later
test/gtest/uct/ib/test_ib_md.cc
Outdated
|
||
/* Test case 4: base memh can still be used to pack mkeys */ | ||
std::vector<uint8_t> base_rkey2 = mkey_pack(base, flags); | ||
EXPECT_NE(base_rkey1, base_rkey2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do not think it is correct. base memh should not be modified once derived memh is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed this
test/gtest/uct/ib/test_ib_md.cc
Outdated
uct_mem_h mw1 = reg_mem_window(base); | ||
|
||
/* Test case 2: creating MW from memh after mkey_pack */ | ||
std::vector<uint8_t> base_rkey1 = mkey_pack(base, flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need invalidation on first mkey pack of base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need test with packing without invalidation first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of UCT invalidation would mean that we destroy derived memh. Currently I don't have tests with invalidation, actually it's a good point to add them.
Or I misunderstood your comment
src/uct/cuda/cuda_ipc/cuda_ipc_md.c
Outdated
@@ -372,9 +372,17 @@ static ucs_status_t | |||
uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length, | |||
const uct_md_mem_reg_params_t *params, uct_mem_h *memh_p) | |||
{ | |||
uct_mem_h base = (params != NULL) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params
is not checked to be non-NULL in other places. I think we can remove this check here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, initially I didn't have this check, added it to fix NULL pointer crash in CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember the failed test? Looks like an issue in caller function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember it, but I just see my commit named "Fix NPE" that I've made to address that failure: 53a4b08
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, found and fixed this NULL pointer error in gtest: b193dea
* A pointer to an existing memory handle. | ||
* Used to register a derived memh: a shallow copy of an existing UCT memh | ||
* which can be used to access the same memory region. When created, the | ||
* derived memh inherits the access flags and the state of the original |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we modify the access flags when creating a memory window? i would expect yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not aware of this use case.
Do you propose to update this documentation? Or extend derived memh API so that we can create it with alternate access rights?
src/uct/cuda/cuda_ipc/cuda_ipc_md.c
Outdated
@@ -375,6 +375,10 @@ uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length, | |||
uct_cuda_ipc_memh_t *memh; | |||
CUdevice cu_device; | |||
|
|||
UCT_CHECK_PARAM((params == NULL) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO no need to check (params == NULL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment I got from Raul)
I added this check intentionally, because some unit test was triggering this call with NULL param.
Ok, let me remove it, so we see exactly that test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, found and fixed this NULL pointer error in gtest: b193dea
src/uct/cuda/cuda_ipc/cuda_ipc_md.c
Outdated
@@ -375,6 +375,10 @@ uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length, | |||
uct_cuda_ipc_memh_t *memh; | |||
CUdevice cu_device; | |||
|
|||
UCT_CHECK_PARAM((params == NULL) || | |||
UCT_MD_MEM_REG_FIELD_VALUE(params, memh, FIELD_MEMH, NULL) == NULL, | |||
"CUDA IPC does not support derived memory handles"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make this a common macro and use in other memory domains that dont support deriver memh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can
I think we can use this new macro only in those MDs which advertise INVALIDATION support, but do not properly handle it. These are namely: cuda_ipc_md and cma_md
src/uct/ib/base/ib_md.c
Outdated
unsigned mem_flags, size_t memh_base_size, | ||
size_t mr_size, uct_ib_mem_t **memh_p) | ||
static uct_ib_mem_t * | ||
uct_ib_memh_alloc_internal(uct_ib_md_t *md, size_t memh_base_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make better names, instead of uct_ib_memh_alloc_internal+uct_ib_memh_alloc - uct_ib_memh_alloc+uct_ib_memh_new/init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uct_ib_memh_alloc
is an existing function, so we don't want to change its name
I just extracted some common part from uct_ib_memh_alloc
into _internal so that it can be reused by uct_ib_memh_clone
.
Maybe we name it uct_ib_memh_alloc_common
?
src/uct/ib/base/ib_md.c
Outdated
memh = uct_ib_memh_alloc_internal(md, memh_base_size, mr_size, &memh_size); | ||
if (memh == NULL) { | ||
return UCS_ERR_NO_MEMORY; | ||
} | ||
|
||
memcpy(memh, src, memh_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- clone seems wrong - if the derived memh is shallow, why need to fully copy the original memh?
- seems weird that we use calloc() and then override everything with memcpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answering in the opposite order:
2. Right, here malloc would be enough but I'm reusing existing function uct_ib_memh_alloc_internal
that does calloc
and calculates the size, just to reuse the common code. I think the overhead is minimal.
We could split size calculation into a separate function etc, but I think it's not worth the effort
- Ok, maybe it's a terminology issue here
My understanding is:
Deep copy creates an independent instance of an object, that can be used apart from the original.
Shallow copy creates an "alias" that depends on master copy and cannot be used separately.
The latter is what's implemented in this PR, whatever we name it.
Derived memh is a shallow copy, because it makes a shallow copy of MRs - the most important part of original memh. And original object remains the only owner of the MRs state.
Of course there could be different implementations of derived memh. For example, we can imagine a shallow copy looking like that:
struct {
uct_ib_mlx5_devx_mem_t *base;
// derived specific fields
} uct_derived_memh;
Looks ok, right? Still you need to allocate this object.
What are the issues with this approach:
- There are places where we assume that
memh
has always theuct_ib_mem_t
base: uct_rc_mlx5_txqp_tag_inline_post: ((uct_ib_mem_t*)iov->memh)
So we must adduct_ib_mem_t super
field to our copy - The copy must handle a set of rkeys independently from the original memh, meaning that the following fields also needs to be copied:
struct mlx5dv_devx_obj *atomic_dvmr;
struct mlx5dv_devx_obj *indirect_dvmr;
uint32_t atomic_rkey;
uint32_t indirect_rkey;
This is just to show you that we need to duplicate the significant part of the original memh anyway.
- Then in each and every place we need to check whether passed memh is derived or original (because their layout is different), adding a lot of boilerplate code and CPU overhead..
- Then we need to modify quite some existing functions for key generation, because they cache keys in the original object.
To overcome all these issues I make a shallow copy of the original memh (== they have the same memory layout), and use it interchangeably in all the places. This allows me to minimize the amount of ifs in the code, basically we check for derived handle only during init and cleanup. This helps to avoid any refactoring in the existing functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we nee to brainstorm the approaches. I don't like having structs where only some fields used, maybe need a deeper refactoring in the IB memh structure.
src/uct/api/v2/uct_v2.h
Outdated
/** | ||
* Memory domain supports derived memory handle registration. | ||
*/ | ||
UCT_MD_FLAG_DERIVED = UCS_BIT(13) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UCT_MD_FLAG_REG_DERIVED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/uct/ib/mlx5/dv/ib_mlx5dv_md.c
Outdated
const uct_ib_mlx5_devx_mem_t *src, | ||
uct_ib_mlx5_devx_mem_t **memh_p) | ||
{ | ||
size_t mr_size = src->super.flags & UCT_IB_MEM_IMPORTED ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
What?
This PR is a composite part of the new memory invalidation design based on the concept of memory windows.
Here we introduce changes on the UCT side.
On UCT side memory window represents a shallow copy of some existing UCT memory handle, which can be used to access the same memory region. When created, the memory window inherits the original memh access flags and state, and takes ownership of the indirect keys of the original memory handle. The lifetime of the memory window is bound to the original memh, and the original memh cannot be destroyed until all its memory windows are destroyed.
Design doc
Why?
The overall idea is to employ a lightweight invalidation: when failure happens we invalidate only the exposed indirect rkeys, but don't deregister the entire memory region, which is an expensive operation. Lightweight invalidation enables invalidation workflow for RNDV pipeline protocol.
How?
uct_md_mem_reg_params_t
:UCT_MD_MEM_WINDOW
. Existingmem_reg
call will create a memory window based on existing UCT memh